PREFACE

This  Memorandum  presents  some  mathematical  results 
pertaining  to  the  study  of  interval  graphs  and  their 
application  to  a  problem  concerning  the  structure  of  genes. 
The mathematical statement of the particular genetic problem
considered here is: given a large number of mutant genes,
together with intersection data on pairs of mutants, decide
whether this information is compatible with a linear model of
the gene. This means that one must determine whether the graph
of the intersection data is an interval graph, and this can be
done by considering a certain incidence matrix associated with
the graph.

The  research  presented  in  this  Memorandum  is  an 
example  of  the  basic  supporting  studies  in  mathematics 
conducted  by  RAND  mathematicians. 


SUMMARY 


Let A = (a_ij) be an m-by-n matrix whose entries a_ij
are all either 0 or 1. For certain applications it is
of interest to know whether or not there is an m-by-m
permutation matrix P such that the 1's in each column of
PA occur in consecutive positions. In this note certain
results having relevance to this problem are stated.

Proofs  of  these  results  together  with  computationally 
effective  algorithms  for  deciding  the  question  are  to 
be  published  elsewhere. 

The  problem  formulated  above  is  directly  related  to 
that  of  determining  whether  a  given  finite,  undirected 
graph is an interval graph. Although necessary and sufficient
conditions are known for the solution of this latter
problem,  the  approach  used  here  is  quite  different  from 
those  used  heretofore  and  seems  to  lead  to  highly  efficient 
algorithms  not  only  for  resolving  the  question,  but  also 
for  producing  a  representative  set  of  intervals  in  the 
affirmative  case. 
1. INTRODUCTION

Let A = (a_ij) be an m-by-n matrix whose entries a_ij
are all either 0 or 1. For certain applications, one of
which will be discussed below, it is of interest to know
whether there is an m-by-m permutation matrix P such that
the 1's in each column of PA occur in consecutive positions.
In this note we state certain results that have relevance
for this problem. Proofs of these, together with an efficient
computational method for deciding the question in any
given case, will be published elsewhere.

The  problem  posed  above  includes  that  of  determining 
whether  a  given  finite,  undirected  graph  is  an  interval 
graph.  The  study  of  interval  graphs  [2,  3,  4,  5]  was 
stimulated  in  part  by  an  application  concerning  the  fine 
structure  of  genes.  A  basic  genetic  problem,  discussed 
in  [1],  is  to  decide  whether  or  not  the  sub-elements  of 
genes  are  linked  together  in  a  linear  order.  A  way  of 
approaching  this  problem  is  also  described  in  [1].  Briefly, 
it  is  as  follows.  For  certain  microorganisms,  there  are 
a  standard  form  and  mutants,  the  latter  arising  from  the 
standard  form  by  alteration  of  some  connected  portion  of 
the genetic structure. Experiments can be devised for
determining whether or not the blemished parts of two
mutant genes intersect. Thus the mathematical
problem becomes: given a large number of mutants together
with intersection data on pairs of mutants, decide whether
this information is compatible with a linear model of the
gene. If one represents the intersection data by a graph
(two mutant genes, i.e., vertices, being joined by an edge
if their blemished portions intersect), the problem is to
decide whether this graph is an interval graph.

2. A BASIC THEOREM

We say that a (0, 1)-matrix A has the consecutive 1's
property (for columns) if there is a permutation matrix P
such that the 1's in each column of PA occur consecutively.
The first question that naturally arises is how much
information about A is needed to decide whether it has the
property or not. Do we need to know A itself, or will
something less suffice? Theorem 2.1 below provides a
partial answer to this question; it shows that a knowledge
of the matrix A^T A is enough. Here A^T denotes the transpose
of A.
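
The definition can be tested directly on very small matrices by
trying every row order. The following Python sketch does exactly
that; it is an exhaustive search over all m! row orders, not the
efficient method referred to in this note, and the function names
column_is_consecutive and find_row_order are illustrative
assumptions.

```python
from itertools import permutations


def column_is_consecutive(column):
    """True if the 1's of a 0/1 sequence form one unbroken run (or there are none)."""
    ones = [i for i, v in enumerate(column) if v == 1]
    return not ones or ones[-1] - ones[0] + 1 == len(ones)


def find_row_order(A):
    """Return a row order that makes every column's 1's consecutive, or None.

    A is a list of rows of 0/1 entries. The search tries all m! row
    orders, so it is only suitable for tiny matrices.
    """
    m = len(A)
    n = len(A[0]) if m else 0
    for order in permutations(range(m)):
        if all(column_is_consecutive([A[r][j] for r in order]) for j in range(n)):
            return order
    return None


# Rows 0 and 2 must be adjacent (column 0), and rows 1 and 2 (column 1).
print(find_row_order([[1, 0], [0, 1], [1, 1]]))
# Three columns that each demand a different pair of the three rows be adjacent.
print(find_row_order([[1, 1, 0], [1, 0, 1], [0, 1, 1]]))
```

The returned tuple lists the original row indices in their new
order, which is the same information as the permutation matrix P.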

Theorem 2.1. Let A and B be (0, 1)-matrices satisfying

(2.1)  A^T A = B^T B.

Then either both A and B have the consecutive 1's property
or neither does. Moreover, if A and B have the same number
of rows and A has the property, then there is a permutation
P such that B = PA.
The first part of Theorem 2.1 follows easily from the
second. The second assertion can be proved by induction on
the number of columns of A.
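
The easy direction behind (2.1) can be checked numerically:
permuting the rows of A leaves A^T A unchanged, since
(PA)^T (PA) = A^T P^T P A = A^T A. The following Python sketch
illustrates this on a small example. The function name gram and
the sample matrices are illustrative assumptions, and the sketch
does not prove the theorem, whose substance is the converse
statement.

```python
def gram(A):
    """Return A^T A for a 0/1 matrix A given as a list of rows."""
    n = len(A[0])
    return [[sum(row[i] * row[j] for row in A) for j in range(n)]
            for i in range(n)]


A = [[1, 0],
     [1, 1],
     [0, 1]]

# B is A with its rows reordered (old rows 2, 0, 1), i.e. B = PA.
B = [A[2], A[0], A[1]]

# Entry (i, j) of A^T A is the inner product of columns i and j,
# which does not depend on the order of the rows.
print(gram(A) == gram(B))
```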

In view of Theorem 2.1, it would be interesting to
know conditions on A^T A in order that A have the consecutive
1's property. Later on we shall state a theorem which
reduces this question to the consideration of (0, 1)-matrices
having connected "overlap graphs." For such matrices, there
is a simple construction for testing the property, but we
do not know explicit necessary and sufficient conditions.

3. THE OVERLAP GRAPH AND COMPONENT GRAPH

Let a and b be (0, 1)-vectors having m components.
Their inner product a·b satisfies

(3.1)  0 ≤ a·b ≤ min(a·a, b·b).

If strict inequality holds throughout (3.1), we say that
a and b overlap. We also say that a contains b if

(3.2)  a·b = b·b.
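
The two relations translate directly into inner-product tests.
The Python sketch below is a minimal illustration; the function
names dot, overlaps and contains are assumptions introduced here,
and vectors are represented as lists of 0's and 1's.

```python
def dot(a, b):
    """Inner product of two 0/1 vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def overlaps(a, b):
    """Strict inequality throughout (3.1): 0 < a.b < min(a.a, b.b)."""
    ab = dot(a, b)
    return 0 < ab < min(dot(a, a), dot(b, b))


def contains(a, b):
    """Condition (3.2): a.b == b.b, i.e. every 1 of b is also a 1 of a."""
    return dot(a, b) == dot(b, b)


print(overlaps([1, 1, 0], [0, 1, 1]))   # share a 1, neither contains the other
print(overlaps([1, 1, 1], [0, 1, 0]))   # second is contained in first
print(contains([1, 1, 1], [0, 1, 0]))
```

For (0, 1)-vectors, a·a is simply the number of 1's in a, so two
vectors overlap exactly when their sets of 1's intersect and
neither set contains the other.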


Now let A be an m-by-n (0, 1)-matrix having column
vectors a_j, j = 1, 2, ..., n. It is convenient, and presents
no loss of generality in studying the consecutive 1's property,
to assume that a_j ≠ 0, j = 1, 2, ..., n, and that a_i ≠ a_j
for i ≠ j. We refer to such an A as proper.
There are various graphs one can associate with a
(0, 1)-matrix A that are meaningful insofar as the consecutive
1's property is concerned. We describe two such, one
being an undirected graph, the other a directed graph. The
first of these is obtained from A by taking vertices
x_1, x_2, ..., x_n corresponding to the columns a_1, a_2, ..., a_n
of A and putting in undirected edges (x_i, x_j) corresponding
to overlapping column vectors a_i and a_j. We call this the
overlap graph of A and denote it by G = G(A). The overlap
graph of A splits up into connected components G_1, G_2, ..., G_p,
and this decomposition yields a corresponding partition of
A into m-rowed submatrices A_1, A_2, ..., A_p. We now form
a second (directed) graph by taking vertices X_1, X_2, ..., X_p
corresponding to these submatrices and putting in an edge
[X_i, X_j] directed from X_i to X_j if there is a column vector
a of A_i and a column vector b of A_j such that a contains
b. We call this directed graph the component graph of A
and denote it by D = D(A).
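
Both graphs can be computed from inner products of columns alone.
The Python sketch below assumes a proper matrix supplied as a list
of its column vectors; the function names overlap_components and
component_graph, and the sample columns, are illustrative
assumptions rather than part of the original note.

```python
def dot(a, b):
    """Inner product of two 0/1 vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def overlap_components(columns):
    """Connected components of the overlap graph, as sorted lists of column indices."""
    n = len(columns)

    def overlaps(i, j):
        ab = dot(columns[i], columns[j])
        return 0 < ab < min(dot(columns[i], columns[i]),
                            dot(columns[j], columns[j]))

    seen, components = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:                      # depth-first search from `start`
            i = stack.pop()
            comp.append(i)
            for j in range(n):
                if j not in seen and overlaps(i, j):
                    seen.add(j)
                    stack.append(j)
        components.append(sorted(comp))
    return components


def component_graph(columns, components):
    """Directed edges (s, t), s != t: some column of component s contains one of t."""
    edges = set()
    for s, cs in enumerate(components):
        for t, ct in enumerate(components):
            if s != t and any(
                dot(columns[i], columns[j]) == dot(columns[j], columns[j])
                for i in cs for j in ct
            ):
                edges.add((s, t))
    return edges


cols = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 0, 1]]
comps = overlap_components(cols)
print(comps)
print(component_graph(cols, comps))
```

In this example only columns 1 and 2 overlap, so there are three
components, and the all-ones column contains a column of each of
the other two.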

The following theorem may be established in a straightforward
manner.

Theorem 3.1. The component graph D(A) of a proper
(0, 1)-matrix A is acyclic and transitive.

That is, D(A) contains no directed cycles, and if
[X, Y] and [Y, Z] are edges, then [X, Z] is an edge. Thus
D(A) is the graph of a partial ordering. This partial
ordering of components of G(A) is special in the sense
that an element can have at most one immediate predecessor.
Thus if we omit from D(A) every edge whose existence is implied
by transitivity, the resulting graph is simply a collection
of rooted trees.

The structure of the component graph D(A) is useful
in establishing the decomposition theorem of the next
section.

4.  A  DECOMPOSITION  THEOREM 

For an arbitrary (0, 1)-matrix A, we can rearrange
columns and write

(4.1)  A = (A_1, A_2, ..., A_p),

where each submatrix A_k, k = 1, 2, ..., p, corresponds to a
component of the overlap graph G(A). We term (4.1) an
overlap decomposition of A, and refer to the submatrices
A_k as components of A. If A has just one component, we say
that A is connected.

Theorem 4.1. A (0, 1)-matrix A has the consecutive
1's property if and only if each of its components has the
property.

Necessity in Theorem 4.1 is of course trivial. Sufficiency
can be established by induction on the number of
components of a proper A. The induction step proceeds by
deleting a component of A which corresponds to a minimal
element in the partial ordering given by D(A).
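
The statement of Theorem 4.1 can be spot-checked on small examples
by comparing a test of the whole matrix with tests of its
components. The Python sketch below does this by exhaustive search
over row orders; it is not the efficient procedure for connected
matrices mentioned in this note, and the function names has_c1p,
components and c1p_by_components are illustrative assumptions.

```python
from itertools import permutations


def dot(a, b):
    """Inner product of two 0/1 vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def has_c1p(columns, m):
    """Brute force: does some order of the m rows make every column's 1's consecutive?"""
    def run(col, order):
        ones = [pos for pos, r in enumerate(order) if col[r] == 1]
        return not ones or ones[-1] - ones[0] + 1 == len(ones)
    return any(all(run(c, order) for c in columns)
               for order in permutations(range(m)))


def components(columns):
    """Connected components of the overlap graph, each as a list of columns."""
    def overlaps(a, b):
        return 0 < dot(a, b) < min(dot(a, a), dot(b, b))
    remaining, comps = list(columns), []
    while remaining:
        comp, frontier = [], [remaining.pop(0)]
        while frontier:
            a = frontier.pop()
            comp.append(a)
            linked = [b for b in remaining if overlaps(a, b)]
            remaining = [b for b in remaining if not overlaps(a, b)]
            frontier.extend(linked)
        comps.append(comp)
    return comps


def c1p_by_components(columns, m):
    """Test each component separately, as Theorem 4.1 permits."""
    return all(has_c1p(comp, m) for comp in components(columns))


good = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 0, 1]]
bad = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
print(has_c1p(good, 4), c1p_by_components(good, 4))
print(has_c1p(bad, 3), c1p_by_components(bad, 3))
```

The two answers agree on both inputs, as the theorem asserts. The
second input is a single connected component without the property.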

Theorem 4.1 effectively solves the problem posed in
Section 1, since one can describe a very simple and efficient
procedure for testing whether or not a connected matrix has
the consecutive 1's property. Moreover, having arranged each
individual component of a disconnected A so that its 1's
appear consecutively in each column, the proof of Theorem
4.1 indicates how to fit these components together so as to
yield a permuted form of A which has consecutive 1's in each
column. The entire process is computationally efficient,
requiring no more than O(n^2) steps if A has n columns.

5. APPLICATION TO INTERVAL GRAPHS

A graph G (finite, undirected, without multiple edges
or loops) is an interval graph provided G can be represented
as the intersection graph of a set of intervals on the real
line. The theorems and methods described in preceding sections
can be applied to the problem of determining when a graph
G is an interval graph by considering a certain incidence
matrix which specifies G. We term this incidence matrix the
dominant-clique-vs.-vertex matrix, and define it as follows.
First of all, a clique in G is a set of vertices, every
two of which are joined by an edge. We may partially order
the set of all cliques of G by inclusion. The maximal
elements in this ordering will be termed dominant cliques.
The rows of the matrix correspond to the dominant cliques
and the columns to the vertices, an entry being 1 when the
vertex belongs to the clique and 0 otherwise.
Since two vertices of G are joined by an edge if and only
if they belong to some dominant clique, the
dominant-clique-vs.-vertex incidence matrix characterizes G.

Theorem 5.1. A graph G is an interval graph if and
only if the dominant-clique-vs.-vertex incidence matrix of
G has the consecutive 1's property.
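
Theorem 5.1 can be applied literally to very small graphs. The
Python sketch below finds the dominant cliques by brute force,
forms the incidence matrix with one row per dominant clique and
one column per vertex, and searches all row orders. Both searches
are exponential and are meant only to illustrate the statement;
the function names dominant_cliques and is_interval_graph are
illustrative assumptions.

```python
from itertools import combinations, permutations


def dominant_cliques(vertices, edges):
    """All maximal cliques, by brute force over vertex subsets (tiny graphs only)."""
    adj = {frozenset(e) for e in edges}

    def is_clique(s):
        return all(frozenset(p) in adj for p in combinations(s, 2))

    cliques = [set(s) for k in range(1, len(vertices) + 1)
               for s in combinations(vertices, k) if is_clique(s)]
    # Keep only cliques not properly contained in another clique.
    return [c for c in cliques if not any(c < d for d in cliques)]


def is_interval_graph(vertices, edges):
    """Apply Theorem 5.1: test the clique-vs.-vertex matrix for consecutive 1's."""
    cliques = dominant_cliques(vertices, edges)
    # Rows are dominant cliques, columns are vertices.
    matrix = [[1 if v in c else 0 for v in vertices] for c in cliques]

    def consecutive(col):
        ones = [i for i, x in enumerate(col) if x]
        return not ones or ones[-1] - ones[0] + 1 == len(ones)

    return any(
        all(consecutive([matrix[r][j] for r in order])
            for j in range(len(vertices)))
        for order in permutations(range(len(cliques)))
    )


path = ([0, 1, 2], [(0, 1), (1, 2)])
square = ([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3), (3, 0)])
print(is_interval_graph(*path))
print(is_interval_graph(*square))
```

The path is an interval graph. The circuit on four vertices is
not: its dominant cliques are its four edges, and no linear order
of them places the two edges at every vertex next to each other.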


We also note that an interval graph is necessarily a
rigid-circuit graph [2], and that one can describe a simple
method to test for the rigid-circuit property. (A graph
is a rigid-circuit graph if every circuit with more than
three vertices has a chord. The test is based on the known
fact that such a graph always contains simplicial vertices,
a simplicial vertex being one whose neighboring vertices
are a clique [2], [3].) If the test succeeds, the method
automatically generates all dominant cliques. Thus to
discover if G is an interval graph, one can first apply
an easy test for the rigid-circuit property, and then test
the resulting dominant-clique-vs.-vertex incidence matrix
for the consecutive 1's property.
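
One way to realize such a test is to delete simplicial vertices
one at a time, recording each deleted vertex together with its
current neighbors. The Python sketch below follows this standard
elimination scheme. It is offered as an assumption about how the
test might be organized, not as the authors' own procedure, and
the function name rigid_circuit_cliques is hypothetical.

```python
def rigid_circuit_cliques(vertices, edges):
    """Return the dominant cliques if the graph is rigid-circuit, else None."""
    nbrs = {v: set() for v in vertices}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)

    found = []
    while nbrs:
        # A vertex is simplicial if its remaining neighbors are pairwise adjacent.
        simplicial = next(
            (v for v, ns in nbrs.items()
             if all(b in nbrs[a] for a in ns for b in ns if a != b)),
            None,
        )
        if simplicial is None:
            return None  # no simplicial vertex: the graph is not rigid-circuit
        found.append({simplicial} | nbrs[simplicial])
        for u in nbrs.pop(simplicial):
            nbrs[u].discard(simplicial)

    # Every dominant clique is recorded; discard the recorded cliques
    # that are properly contained in another one.
    return [c for c in found if not any(c < d for d in found)]


triangle_with_tail = ([0, 1, 2, 3], [(0, 1), (1, 2), (0, 2), (2, 3)])
square = ([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3), (3, 0)])
print(rigid_circuit_cliques(*triangle_with_tail))
print(rigid_circuit_cliques(*square))
```

The cliques returned in the successful case are the rows of the
dominant-clique-vs.-vertex incidence matrix, which can then be
tested for the consecutive 1's property.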